Optimizing Transformations of Stencil Operations for Parallel Cache-based Architectures

نویسندگان

  • Federico Bassetti
  • Kei Davis
چکیده

This paper describes a new technique for optimizing serial and parallel stencil-and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicitly in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its speci-city it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the beneets of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation and applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D and 2-D tiling for a single processor. For the parallel case both blocking and non-blocking communication have been tested. However, the parallel case is not discussed here.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimizing Transformations of Stencil Operations for Parallel Object-Oriented Scientific Frameworks on Cache-Based Architectures

High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental c...

متن کامل

Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model

Our aim is to apply program transformations to stencil codes in order to yield the highest possible performance. We recognize memory bandwidth as a major limitation in stencil code performance. We conducted a study in which we applied optimizing transformations to two Jacobi smoother kernels: one 3D 1st-order 7-point stencil and one 3D 3rd-order 19-point stencil. To obtain high performance, the...

متن کامل

Loop Transformations for Performance and Message Latency Hiding in Parallel Object-oriented Frameworks (extended Abstract)

Application codes reliably achieve performance far less than the advertised capabilities of existing architectures, and this problem is worsening with increasingly-parallel machines. For large-scale numerical applications, stencil operations are often impose the greater part of the computational cost, and the primary sources of ineeciency are the costs of message passing and poor cache utilizat...

متن کامل

Improving Scalability with Loop Transformations and

Application codes reliably achieve performance far less than the advertised capabilities of existing architectures, and this problem is worsening with increasingly-parallel machines. For large-scale numerical applications, stencil operations are often impose the greater part of the computational cost, and the primary sources of ineeciency are the costs of message passing and poor cache utilizat...

متن کامل

In-Core Optimization of High-Order Stencil Computations

In this paper, we apply in-core optimization techniques to high-order stencil computations, including: (1) cache blocking for efficient L2 cache use; (2) register blocking and data-level parallelism via single-instruction multipledata (SIMD) techniques to increase L1 cache efficiency; and (3) software prefetching techniques. Our generic approach is tested with a kernel extracted from a 6 th -or...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999